October 2018
This replication package contains data and code to replicate the analyses presented in "Misdemeanor Disenfranchisement? The demobilizing effects of brief jail spells on potential voters" and its online appendix.
Contact Ariel White (arwhi@mit.edu) with questions. 

This replication package includes merged and deidentified datasets that allow for the reproduction of most of the analyses presented in the main paper and the online appendix (referred to below as the SI). However, I have not included the raw court records data, out of concern that circulating criminal-history data could eventually make life more difficult for some of the people named in those records (especially if they manage to have their records expunged and my dataset does not reflect that). Instead, I include the scripts I used to get from the raw data to the merged dataset included here, so that others can judge the merging process. Anyone particularly interested in reproducing that part of the project can request the court records data from the Harris County District Clerk's office, which calls it "historical criminal records data".

DATASETS:
- "defendants_voter1_deidentified.Rdata" is the main replication dataset used to reproduce most of the tables and figures from this project. 

- "defendants_voter1_manyyears_deidentified.Rdata" is similar to the main replication dataset, but covers more years of cases (from 2000 onward) and includes fewer merged-in covariates. 

- "defendants_join2d_deidentified.Rdata" is similar to the main replication dataset, but adds neighborhood poverty rates for people whose addresses could be geocoded.

- "defendants_voter1recentcases_deidentified.Rdata" is analogous to the main replication dataset, but incorporates more recently acquired case records from 2012-2016 for additional analyses. 

- "casetimingset_deidentified.Rdata" is a dataset used by "Harrisco_individualcasetiming.R" to explore a different identification strategy (for Section 1.6 of the SI). 

SCRIPTS:
- Harris_defendantfileprep_allyears.R takes the raw Harris County case data and generates a dataset of people facing their first misdemeanor cases during the period studied in this paper. As noted above, this script cannot be run without acquiring the criminal case records.

- parsedefendantnames.py parses the "def_nam" field of the court records into first, last, and middle names (plus any prefixes and suffixes) in preparation for the merge to the voter file. As noted above, this script cannot be run without acquiring the criminal case records.

- Harris_voterfilemerge.R merges the file created by Harris_defendantfileprep_allyears.R with the Texas voter file (acquired from NationBuilder in 2014; also not included here), yielding the partially deidentified dataset that is provided with this replication package. It also produces Figure A3 and Table A8 for the SI. As noted above, this script cannot be run without acquiring the criminal case records as well as the voter file.

- Harris_mainanalysisreplication.R is the main analysis script. This script takes in "defendants_voter1_deidentified.Rdata" and several of the supplementary datasets, and replicates most of the tables and figures in the main paper and SI. It also outputs data that is then used by a Stata script (stata_LIML_replication.do) to explore additional IV approaches.

- stata_LIML_replication.do takes in the dataset output by "Harris_mainanalysisreplication.R" and produces the table of alternative IV estimators in Section 3.3.2 of the SI.

- Harrisco_replication_newercases.R pulls in "defendants_voter1recentcases_deidentified.Rdata" and generates the recent-cases analyses presented in Section 7 of the SI. The commented code at the top of the script shows how the deidentified dataset was created.  

- Harrisco_individualcasetiming.R pulls in "casetimingset_deidentified.Rdata" to make the plots in Figure A1 in the SI. 
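Because Harris_defendantfileprep_allyears.R cannot be run without the court records, the following minimal Python sketch illustrates the kind of filtering it describes: keeping people facing their first misdemeanor case during the study period. The field names (person_id, filed, level), the "M"/"F" level codes, and the function name first_time_misdemeanants are hypothetical. Reading "first misdemeanor case" as "the person's earliest recorded case of any kind is a misdemeanor" is also an assumption; the actual R script may define it differently.

```python
"""Hypothetical sketch: keep defendants whose first recorded case is a
misdemeanor filed inside a study window.

Field names (person_id, filed, level) and the level codes are assumptions
for illustration, not the actual Harris County schema.
"""
from datetime import date
from typing import Dict, List, NamedTuple


class Case(NamedTuple):
    person_id: str  # hypothetical stable identifier for a defendant
    filed: date     # case filing date
    level: str      # "M" = misdemeanor, "F" = felony (assumed codes)


def first_time_misdemeanants(
    cases: List[Case], start: date, end: date
) -> Dict[str, Case]:
    """Return each qualifying person's first case, keyed by person_id."""
    earliest: Dict[str, Case] = {}
    for case in cases:
        current = earliest.get(case.person_id)
        # Track the earliest-filed case seen so far for this person.
        if current is None or case.filed < current.filed:
            earliest[case.person_id] = case
    # Keep people whose very first case is a misdemeanor filed in [start, end].
    return {
        pid: case
        for pid, case in earliest.items()
        if case.level == "M" and start <= case.filed <= end
    }
```

Under this reading, someone with an earlier felony case is excluded even if they later face a misdemeanor inside the window, because their earliest case is not a misdemeanor.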

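The name-splitting step performed by parsedefendantnames.py can be illustrated with a short Python sketch. It assumes, without confirmation from the court records, that "def_nam" values follow a "LAST, FIRST MIDDLE SUFFIX" pattern (e.g., "SMITH, JOHN ALLEN JR"). The prefix and suffix lists and the function name parse_name are hypothetical, and the real script may handle more name formats.

```python
"""Hypothetical sketch of splitting a court-record name field into parts.

Assumes (not confirmed by the replication package) that def_nam values look
like "LAST, FIRST MIDDLE SUFFIX", e.g. "SMITH, JOHN ALLEN JR".
"""
from typing import Dict

# Illustrative (hypothetical) token lists; the real script may use different ones.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}
PREFIXES = {"MR", "MRS", "MS", "DR"}


def parse_name(def_nam: str) -> Dict[str, str]:
    """Split a 'LAST, FIRST MIDDLE SUFFIX' string into labeled parts."""
    parts = {"prefix": "", "first": "", "middle": "", "last": "", "suffix": ""}
    last, _, rest = def_nam.partition(",")
    parts["last"] = last.strip().upper()
    # Drop periods so "JR." and "JR" are treated alike, then split on whitespace.
    tokens = rest.replace(".", " ").upper().split()
    if tokens and tokens[0] in PREFIXES:
        parts["prefix"] = tokens.pop(0)
    if tokens and tokens[-1] in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        parts["first"] = tokens.pop(0)
    # Whatever remains between the first name and any suffix is the middle name.
    parts["middle"] = " ".join(tokens)
    return parts
```

Separating suffixes such as "JR" from middle names matters for the voter-file merge, since a suffix left in the middle-name field would cause otherwise identical records to disagree.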
